ABSTRACT

In this paper we deal with the problem of chromaticity, i.e. apparent po- 
sition variation of stellar images with their spectral distribution, using 
neural networks to analyse and process astronomical images. The goal 
is to remove this relevant source of systematic error in the data reduc- 
tion of high precision astrometric experiments, like Gaia. This task can 
be accomplished thanks to the capability of neural networks to solve a 
nonlinear approximation problem, i.e. to construct a hypersurface that
approximates a given set of scattered data pairs. Images are encoded
associating each of them with conveniently chosen moments, evaluated 
along the y axis. In the current framework, the proposed technique
reduces the initial chromaticity of a few milliarcseconds to values of a few
microarcseconds.

Key words: astrometry - methods: numerical - techniques: image 
processing. 



1 INTRODUCTION 

The position of a stellar image can be located with
an accuracy well below its characteristic size,
when the signal to noise ratio (SNR) is sufficiently
high. The location uncertainty is $\sigma = \alpha \cdot L/\mathrm{SNR}$,
where $\alpha$ is a factor taking into account geometric
factors and the centring algorithm, and $L$ is the root
mean square width of the image. The best estimate
of image position is obtained by a least squares
approach, evaluating the discrepancy between the 
data and the template describing the reference im- 
age. The location algorithm is therefore very sensitive
to any variation of the actual image with respect
to the selected template. 

It is necessary to check the compatibility be- 
tween the real image and the reference profile; also 
important is the capability of extracting from the 
data a set of parameters suitable for a new definition 
of the template, in order to improve its consistency 
with the data. Self-calibration of the data, by deduc- 
tion of the parameters for optimisation of the image 
template, is a key element in the control of the sys- 
tematic effects in the position measurement. 

In particular, the individual spectral distribution
of each object leaves a signature on the image
profile, due to diffraction, especially in the presence
of aberrations. For these reasons, our target is
the implementation of a tool for the analysis of realistic
images.

Attempts to use neural networks (NN) in astronomy
have been made in the past, mainly in the
field of adaptive optics: details can be found e.g. in
(?) and (?).

In Section 2 we discuss the image characterisation
problem addressed in the present work; in
Section 3 we summarise the main features of sigmoidal
NN and the backpropagation algorithm, with a brief
reminder of the specific definitions; and in Section 4 we
describe the generation of the data set, its processing
and the current results.

Table 1. The 21 lowest order Zernike polynomials



2 DIFFRACTION IMAGING



The image of a star, considered as a point-like source 
at infinity, and produced by an ideal telescope, is 
derived in basic textbooks on optics. For an unobstructed
circular pupil of diameter $D$, at wavelength
$\lambda$, it has radial symmetry and is described by the
squared Airy function (see (?) for notation):



$$I(r) = k \left[ 2 J_1(r) / r \right]^2 \qquad (1)$$



Here $J_1$ is the Bessel function of the first kind, order
one, $k$ a normalisation constant, and r = D/2 the
aperture radius. The Airy diameter, between the first
two minima, is $2.44 \lambda / D$ in angular units; the linear
scale is defined by the focal length. 
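
As an illustration of Eq. (1), the following minimal sketch evaluates the squared Airy profile with the Python standard library only. The helper names (`bessel_j1`, `airy_intensity`) are hypothetical, and the Bessel function is obtained from its standard integral representation rather than from any routine used by the authors. The argument r is treated as dimensionless; under the usual assumption r = π D θ / λ, the first zero at r ≈ 3.83 ≈ 1.22 π corresponds to the Airy diameter of 2.44 λ/D quoted above.

```python
import math


def bessel_j1(x, n=2000):
    """J1(x) from the integral representation
    J1(x) = (1/pi) * integral_0^pi cos(theta - x*sin(theta)) dtheta,
    evaluated with the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += math.cos(theta - x * math.sin(theta))
    return total * h / math.pi


def airy_intensity(r, k=1.0):
    """Squared Airy function of Eq. (1): I(r) = k * [2 J1(r) / r]^2."""
    if abs(r) < 1e-8:
        return k  # 2 J1(r) / r tends to 1 for r -> 0
    return k * (2.0 * bessel_j1(r) / r) ** 2


# First positive zero of J1, i.e. the first dark ring (about 1.22 * pi).
FIRST_ZERO = 3.8317059702

if __name__ == "__main__":
    for r in (0.0, 1.0, 2.0, FIRST_ZERO, 5.0):
        print(f"r = {r:6.3f}   I(r)/k = {airy_intensity(r):.6f}")
```

The midpoint rule is adequate here because the integrand is smooth and periodic; a library Bessel routine could be swapped in without changing the rest of the sketch.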

The diffraction image on the focal plane of any 
real telescope, described by a set of aberration val- 
ues, for a given pupil geometry, is deduced by the 
square modulus of the Fourier transform of the pupil 
function $e^{i\Phi}$:

$$I(r,\phi) = k \left| \int d\rho \int d\theta \, \rho \, e^{i\Phi(\rho,\theta)} \, e^{-i \pi r \rho \cos(\theta - \phi)} \right|^2 \qquad (2)$$



where $\{r, \phi\}$ and $\{\rho, \theta\}$ are the polar coordinates,
respectively on image and pupil plane, and the integration
domain corresponds to the pupil: for the
circular case, $0 \le \rho \le 1$; $0 \le \theta \le 2\pi$. In case of a
rectangular pupil, it is more convenient to use cartesian
coordinates on both image and pupil plane, e.g.
$\{x, y\}$ and $\{\xi, \eta\}$, respectively, integrating between
the appropriate boundaries $[\xi_1, \xi_2]$; $[\eta_1, \eta_2]$.
The phase aberration $\Phi$ describes for the real case
the deviation from the ideal flat wavefront, i.e. the
wavefront error (WFE), and is usually decomposed
by means of a set of functions (e.g. the five Seidel
classical aberrations or Zernike functions, whose first
21 terms are listed in Tab. 1):



$$\Phi(\rho,\theta) = \frac{2\pi}{\lambda} \, \mathrm{WFE} = \frac{2\pi}{\lambda} \sum_n A_n \phi_n(\rho,\theta) \qquad (3)$$



If $\Phi = 0$ (non-aberrated case, $\{A_n\} = 0$), we obtain
a flat wavefront, i.e. WFE = 0, and Eq. (1) is
retrieved for the circular pupil.

The nonlinear relation between the set of aberration
coefficients $A_n$ and the image is made evident
by substituting Eq. (3) into Eq. (2). In particular,
the WFE is independent of wavelength, and the wavelength
dependence of the pupil function is carried by
the $2\pi/\lambda$ factor.

The real polychromatic image of an unresolved stel- 
lar source is produced by integration over the appro- 
priate bandwidth of the monochromatic PSF above, 
weighted by the combination of source spectral distribution,
instrument transmission and detector re-
sponse. Thus, objects with different spectral distri- 
butions have different image profiles, and the posi- 
tion estimate produced by any location algorithm 
(e.g. the centre of gravity, COG, or barycentre) is 
affected by discrepancy with respect to the nominal 
position from the image generated by an ideal optical 
system. 

The variation of apparent position with source 
spectral distribution is what we call chromatic- 
ity, and it is relevant to high precision astrometry 
because in normal telescope configurations it can 
amount to several milliarcseconds, inducing severe 
limitations with respect to the measurement goal. 
For example, in the Gaia mission (?), the individual
exposure precision for bright objects is of the order
of a few tens of microarcseconds. It is possible to use different
position estimators (e.g. least squares methods
rather than COG), and each procedure is affected by 
a specific spectral sensitivity. 
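
The definition of chromaticity given above can be illustrated with a deliberately simplified, one-dimensional sketch. It assumes a slit pupil of width 1.4 m (the along scan aperture size quoted later in the text), a single cubic wavefront error term in place of the Zernike set, monochromatic images at the two effective wavelengths of 628 nm and 756 nm, and a finite readout window; all function names and the window size are hypothetical choices, not the authors' simulation code. The chromaticity is the difference between the centres of gravity (COG) of the two images.

```python
import cmath
import math

RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0  # radians to milliarcseconds


def psf_1d(wavelength, coeff, aperture=1.4, half_width=1.5e-6,
           n_img=301, n_pup=200):
    """Monochromatic 1-D PSF of a slit pupil with a cubic (odd) wavefront error.

    wavelength, coeff (edge amplitude of the WFE) and aperture are in metres;
    the returned angles are in radians, on a grid symmetric about zero.
    """
    thetas = [(-1.0 + (2 * k + 1) / n_img) * half_width for k in range(n_img)]
    xis = [-0.5 + (m + 0.5) / n_pup for m in range(n_pup)]  # pupil coord / aperture
    # Phase as in Eq. (3): 2*pi/lambda times the WFE, here WFE = coeff * (2*xi)^3.
    phases = [2.0 * math.pi * coeff * (2.0 * xi) ** 3 / wavelength for xi in xis]
    intensity = []
    for theta in thetas:
        freq = theta * aperture / wavelength
        amplitude = sum(
            cmath.exp(1j * (ph - 2.0 * math.pi * freq * xi))
            for ph, xi in zip(phases, xis)
        ) / n_pup
        intensity.append(abs(amplitude) ** 2)
    return thetas, intensity


def centre_of_gravity(thetas, intensity):
    """Intensity-weighted mean position over the sampled window."""
    return sum(t * i for t, i in zip(thetas, intensity)) / sum(intensity)


def chromaticity(coeff, blue=628e-9, red=756e-9):
    """COG difference between the blue and the red monochromatic images."""
    cog_blue = centre_of_gravity(*psf_1d(blue, coeff))
    cog_red = centre_of_gravity(*psf_1d(red, coeff))
    return cog_blue - cog_red


if __name__ == "__main__":
    for coeff_nm in (0.0, 25.0, 50.0):
        value = chromaticity(coeff_nm * 1e-9) * RAD_TO_MAS
        print(f"WFE edge amplitude {coeff_nm:5.1f} nm -> chromaticity {value:+.4f} mas")
```

With a vanishing aberration both images are symmetric and the chromaticity is zero to rounding precision; an odd aberration term makes the profile asymmetric, so the two COG values no longer coincide.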

The common statement that reflective optics
is "achromatic" is true only in the sense that it is not
affected by classical chromatic aberration, typical
of refractive systems. However, chromaticity in the 
above sense is critical. Also, not all aberrations are 
relevant to chromaticity, but the relationship is not 
mathematically trivial; the critical terms introduce 
an asymmetry in the image, along the measurement 
direction, and are associated to odd parity functions. 
An analysis of chromaticity versus aberrations, opti- 
cal design aspects, and optical engineering issues, are 
discussed in a separate paper, in preparation, which 
also deals with design optimisation guidelines. After 
minimisation of the chromaticity by design and con- 
struction, the residual chromaticity must be taken 
into account in the data reduction phase. 

The aberration components are not easily mea- 
sured during operation. In principle, it is possible 
to use techniques developed in past works (?) for
aberration reconstruction from the focal plane images.
This may be considered for future work, but
given the number of aberration terms and the quickly
increasing size of the data set of examples required
for proper training, the computational load becomes 
quite large. 

Instead, in the current paper we are interested 
in the classification capability of a NN to implement 
identification of the chromatic effect from the image 
profile itself, and subsequent correction in the data 
reduction. The goal is a tool for chromaticity self- 
calibration throughout the mission, crucial with re- 
spect to high precision astrometry. We find that the 
image moments are convenient description parame- 
ters, as discussed below. 

The chromaticity is estimated as the difference between
the COG of a blue (B3V) and of a red (M8V) star,
modelled as black bodies, with effective wavelengths 
628 nm and 756 nm respectively, deduced by taking 
into account also the telescope transmission and de- 
tector quantum efficiency. A set of aberration cases 
is generated for the basic telescope geometry of Gaia 
(i.e. 0.49 m off-axis, 1.4 x 0.5 m aperture), under 
the assumption of small image degradation, i.e. of 
reasonably good imaging performance, as desired for 
large field astronomical telescopes. The aberration 
coefficients are generated with a uniform random dis- 
tribution with peak value 50 nm for each component, 
using the Zernike formulation. The coefficient range 
is not specific to a given configuration, but represents
all mathematically possible cases, i.e. a superset of 
the optically feasible systems. 



2.1 Image encoding 

To maximise the field of view, i.e. observe simulta- 
neously a large area, typical astronomical images are 
sampled over a small number of pixels. 

The minimum sampling requirements, related 
to the Nyquist-Shannon criterion, are of the order of
two pixels over the full width at half maximum, or 
about five pixels within the central diffraction peak. 
The signal detected in each pixel is then affected by 
strong variations depending on the initial phase (or 
relative position) of the parent intensity distribution 
(the continuous image) with respect to the pixel ar- 
ray, even in a noiseless case. The pixel intensity dis- 
tribution of the measured images, then, is not conve- 
nient for evaluating the discrepancy of the effective 
image with respect to the nominal image. 

It may be possible to add a sort of magnifying 
device, providing good sampling for the images in a 
small region: in this case, the resolution is adequate 
to minimise the effects of the finite pixel size (?). In
Gaia this would have a heavy impact on the payload,
so that we focus on methods applicable directly
to the science data. Even in case of well sampled 
images, we have to face some problems: assuming a 
sampling of 20 pixels per Airy diameter, and read- 
ing up to the third diffraction lobe, the image size is 
60 x 60 = 3600 pixels. Direct usage of such images as 
input data to the NN is impractical, because of the 
large computational load involved, and identification 
of a more compact encoding, using the science data 
rather than additional custom hardware, appears to 
be appealing. 

Since the Gaia measurement is one-dimensional, 
and most images are integrated in the across scan 
direction, the problem (and the signals considered) 
is also reduced to one dimension, conventionally la- 
belled y : the one-dimensional image is I (y) . The 
encoding scheme we adopt for the images allows ex- 
traction of the desired information for classification; 
each input image is described by the centre of gravity 
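and the first central moments as follows:

$$\mu_y = \int dy \, y \, I(y) / I_{int}$$

$$\sigma_y^2 = \int dy \, (y - \mu_y)^2 \, I(y) / I_{int} \qquad (4)$$

$$M(j) = \int dy \, \left( \frac{y - \mu_y}{\sigma_y} \right)^j I(y) / I_{int}, \quad j > 2$$

where $I_{int} = \int dy \, I(y)$ is the integrated photometric
level of the measurement.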
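
A minimal sketch of this encoding for a sampled one-dimensional profile is given below. It assumes a uniform sampling grid, so that the sampling step cancels in every ratio, and moments normalised by the RMS width (so that the third order moment is the skewness); the function name `image_moments` is hypothetical.

```python
import math


def image_moments(y, intensity, max_order=5):
    """Centre of gravity, RMS width and normalised central moments M(j), j > 2,
    of a 1-D profile sampled on a uniform grid y."""
    i_int = sum(intensity)  # integrated level, up to the constant sampling step
    mu = sum(yi * ii for yi, ii in zip(y, intensity)) / i_int
    var = sum((yi - mu) ** 2 * ii for yi, ii in zip(y, intensity)) / i_int
    sigma = math.sqrt(var)
    higher = {
        j: sum(((yi - mu) / sigma) ** j * ii for yi, ii in zip(y, intensity)) / i_int
        for j in range(3, max_order + 1)
    }
    return mu, sigma, higher


if __name__ == "__main__":
    # Symmetric Gaussian test profile: odd moments vanish, M(4) is close to 3.
    grid = [-20.0 + 0.1 * k for k in range(401)]
    profile = [math.exp(-0.5 * ((yi - 0.3) / 2.0) ** 2) for yi in grid]
    mu, sigma, higher = image_moments(grid, profile)
    print(f"COG = {mu:.4f}, RMS width = {sigma:.4f}")
    for order, value in higher.items():
        print(f"M({order}) = {value:+.4f}")
```

For a symmetric profile the odd moments vanish, which is why they are natural indicators of the image asymmetry associated with chromaticity.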

© 2005 RAS, MNRAS 000

M. Gai and R. Cancelliere

One-dimensional encoding is a further change
with respect to previous problems, in which we took
advantage of the full two-dimensional image structure
to deduce the different aberration terms.
The central moments are much less sensitive than 
the pixel intensity values to the effects related to the 
finite pixel size and the position of the image peak 
with respect to the pixel borders, i.e. the relative 
phase between optical image and pixel array. Thus, 
central moments can be deduced conveniently also on 
the detected low resolution images, without the need 
for high resolution detectors. The encoding technique 
based on using moments as image description parameters
for neural processing was first introduced in (?),
where more details are available.
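
The robustness of central moments against the pixel phase can be checked with a small numerical experiment. The sketch below is an assumption-laden toy case, not the authors' simulation: a Gaussian profile with a full width at half maximum of two pixels is integrated over unit pixels for several sub-pixel offsets, and the relative variation of the brightest pixel is compared with that of the second central moment. All names are hypothetical.

```python
import math

# Gaussian width corresponding to FWHM = 2 pixels (sampling near the minimum requirement).
SIGMA = 2.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def pixel_signal(phase, n_pix=17):
    """Gaussian image centred at `phase` (in pixels), integrated over unit pixels."""
    half = n_pix // 2

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - phase) / (SIGMA * math.sqrt(2.0))))

    return [cdf(k + 0.5) - cdf(k - 0.5) for k in range(-half, half + 1)]


def central_variance(signal):
    """Second central moment of the pixel signal, in pixel units."""
    half = len(signal) // 2
    positions = range(-half, half + 1)
    total = sum(signal)
    mu = sum(p * s for p, s in zip(positions, signal)) / total
    return sum((p - mu) ** 2 * s for p, s in zip(positions, signal)) / total


phases = [k / 10.0 for k in range(-5, 6)]  # image offsets from -0.5 to +0.5 pixel
peaks = [max(pixel_signal(p)) for p in phases]
variances = [central_variance(pixel_signal(p)) for p in phases]
peak_spread = (max(peaks) - min(peaks)) / max(peaks)
var_spread = (max(variances) - min(variances)) / max(variances)

if __name__ == "__main__":
    print(f"relative variation of the brightest pixel: {peak_spread:.3f}")
    print(f"relative variation of the central variance: {var_spread:.2e}")
```

In this toy case the brightest pixel changes by more than ten per cent with the image phase, while the central variance is stable to much better than one part in a thousand.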




3 SIGMOIDAL NEURAL NETWORKS

Neural networks learn from examples: given
the training set of $N$ multi-dimensional data pairs
$\{(x_i, F(x_i)) \mid x_i \in R^P, F(x_i) \in R^Q\}$, $i = 1, \dots, N$,
after the training, if $x_i$ is the input to the network,
the output is close to, or coincident with, the desired
answer $F(x_i)$. The network has generalization
properties too, that is, it gives $F(x_i)$ as output
even if the input is only "close to" $x_i$, for instance a
noisy, distorted or incomplete version of $x_i$; a comprehensive
review on NN properties and applications
can be found in (?).

In our work we use the multilayer perceptron,
first introduced in 1986 (see (10)), as an extension of
the perceptron model (8).

The multilayer perceptron, with sigmoidal units
in the hidden layers, is one of the best known and
most widely used NN models: it computes distances in the input
space (i.e. among patterns $x_i \in R^P$) using a metric
based on inner products, and it is usually trained by
the backpropagation algorithm. The architecture of
a sigmoidal NN is schematically shown in Fig. 1, in
which we find the most common three-layer case.
The network is described by Eqs. (5):

$$a_j^{k+1} = \sum_i w_{ij} \, o_i^{k} + bias_j$$

$$o_j^{k} = \sigma(a_j^{k}) = \frac{1}{1 + e^{-a_j^{k}}} \qquad (5)$$

$$o^{out} = \sum_i w_i \, o_i^{out-1}$$

Here $a$ is the input to each unit, $o$ is its output
and $w_{ij}$ is the weight associated to the connection
between units $i$ and $j$; each unit is defined by two
indexes, a superscript specifying its layer (i.e. input,
hidden or output layer) and a subscript labelling each
unit in a layer.

Figure 1. A multilayer perceptron with one hidden layer

The training procedure aims at finding the best
set of weights $\{w_{ij}\}$ solving the approximation problem
$o(x_i) \simeq F(x_i)$, and this is usually achieved by
the iterative process corresponding to the standard
backpropagation algorithm.

At each step, each weight is modified according
to the gradient descent rule (a more detailed description
can be found in (10)), completed with the momentum
term: $w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$, $\Delta w_{ij} = -\eta \, \partial E / \partial w_{ij}$,
where $E$ is the error functional defined above.
This procedure is iterated many times over the complete
set of examples $\{x_i, F(x_i)\}$ (the training set),
and under appropriate conditions it converges to a
suitable set of weights defining the desired approximating
function. Convergence is usually defined in
terms of the error functional, evaluated over the
whole training set; when a pre-selected threshold $E_t$
is reached, the NN can be tested using a different set
of data $\{x'_i, F(x'_i)\}$, the so called test set.
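
The training loop described above can be sketched in a few lines of plain Python. The example is a toy case under stated assumptions: one scalar input, one hidden layer of five sigmoidal units, a linear output unit, batch gradient descent with a momentum term, and an error functional taken as half the mean squared error (the definition used in the paper is not reproduced here). The class name `TinyMLP`, the target function and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class TinyMLP:
    """One hidden layer of sigmoidal units and one linear output unit."""

    def __init__(self, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input weights
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden biases
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # output weights
        self.b2 = 0.0                                                # output bias
        self.velocity = [0.0] * (3 * n_hidden + 1)

    def forward(self, x):
        hidden = [sigmoid(w * x + b) for w, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def error(self, data):
        """Assumed error functional: half the mean squared error."""
        return 0.5 * sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    def train_epoch(self, data, eta=0.05, momentum=0.5):
        """One backpropagation step over the whole set, with momentum."""
        n = len(self.w1)
        grad = [0.0] * (3 * n + 1)
        for x, target in data:
            hidden, out = self.forward(x)
            delta = (out - target) / len(data)
            for j, h in enumerate(hidden):
                back = delta * self.w2[j] * h * (1.0 - h)  # error at hidden unit j
                grad[j] += back * x            # dE/dw1_j
                grad[n + j] += back            # dE/db1_j
                grad[2 * n + j] += delta * h   # dE/dw2_j
            grad[3 * n] += delta               # dE/db2
        self.velocity = [momentum * v - eta * g for v, g in zip(self.velocity, grad)]
        params = self.w1 + self.b1 + self.w2 + [self.b2]
        params = [p + v for p, v in zip(params, self.velocity)]
        self.w1, self.b1 = params[:n], params[n:2 * n]
        self.w2, self.b2 = params[2 * n:3 * n], params[3 * n]


# Toy training set: samples of sin(x) on [-2, 2].
train = [(x / 5.0, math.sin(x / 5.0)) for x in range(-10, 11)]
net = TinyMLP(n_hidden=5)
initial_error = net.error(train)
for _ in range(1000):
    net.train_epoch(train)
final_error = net.error(train)

if __name__ == "__main__":
    print(f"error before training: {initial_error:.5f}")
    print(f"error after training:  {final_error:.5f}")
```

A convergence test against a threshold, followed by evaluation on an independent test set, would complete the procedure outlined in the text.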



4 DATA PROCESSING AND RESULTS 

In this section we describe the identification of the 
most convenient image parameters, the generation 
of the training and test sets, and the results from 
the NN processing. The sources are represented by
the monochromatic PSF at the wavelengths of
628 nm (B3V) and 756 nm (M8V) respectively, deduced
from the blackbody spectrum associated to the effective
temperature of each star, and the expected
spectral distributions of instrument transmission and
quantum efficiency.

Figure 2. Distribution of chromaticity vs. RMS WFE
over the test set.

Figure 3. Distribution of chromaticity vs. image COG
over the test set.

4.1 Aberration sample 

In order to investigate the relationship between image
moments and chromaticity, we start from a reasonable
sampling of the aberration space, using a
uniform random distribution of the 21 lowest order
Zernike coefficients (Tab. 1), within the range
±50 nm on each term.

For each aberration case, defined by the set of 21
Zernike coefficient values, we evaluate the RMS WFE
for verification purposes and we build the PSF for the
two source cases; on the PSF, the photo-centre position
is evaluated as the COG, and the moments up
to order five are computed according to the definitions
in Eq. (4), after across scan integration to
replicate the Gaia measurement process. The chromaticity
is directly derived as the COG difference.

In Fig. 2 we show the distribution of chromaticity vs.
RMS WFE over the test set (5821 instances). At
increasing values of the aberration RMS WFE, the
chromaticity usually has a larger absolute value, but
the relationship is not simple; the same consideration
holds for the relationship between chromaticity
and any other image moment, due to diffraction nonlinearity.
Some statistical parameters of the distribution
of chromaticity and WFE values in the training
data set are listed in Tab. 2.





            Chromaticity [µas]    RMS WFE [nm]
Min.        -5289.3               2.99
Mean        6.1                   18.71
Max.        5365.9                46.26
RMS         1648.2                6.81



Table 2. Statistics over the training data set of chro- 
maticity and RMS WFE. 



The average RMS WFE corresponds to an overall
optical quality of about $\lambda/30$ at 600 nm, i.e. a
comparatively good performance; some of the optical
designs considered for Gaia provide an RMS WFE of
about 40 nm, or $\lambda/15$ at 600 nm. The chromaticity
evaluated on the proposed designs has peak values of
≈ 2-3 mas, localised in specific field positions, and
symmetric distribution for the nominal aligned con- 
figuration. The random data set considered is thus 
reasonably representative of a range of realistic opti- 
cal configurations. 



4.2 Neural network input 

We verify that the across scan moments (i in the
Gaia reference frame) are all irrelevant, i.e. their effect
on chromaticity is negligible. Usage of the standard
one-dimensional science data is therefore appropriate,
without operation changes. The moments are
all computed with straightforward operations from
the measured data, as well as the variation with respect
to the nominal moment values of a selected
reference spectral type.

Figure 4. Distribution of residual chromaticity vs. image
COG, after slope subtraction.

Some of the along scan ( y ) moments do not show an 
apparent signature associated to chromaticity. A few 
of them are still required to provide an acceptable 
description of the image profile: the moment selec- 
tion was verified on the NN, removing some of them 
until reaching the minimum number of parameters 
compatible with good convergence of the training. 

From the data distribution, it appears that some 
pre-processing is recommended, in order to ease the 
subsequent neural processing. This is most apparent 
in the distribution of chromaticity with respect to the 
nominal image COG, shown in Fig. 3 for the test set.
The data points are distributed in three well-defined
regions following parallel straight lines, shown in the
figure by different colours.

The chromaticity / COG structure is shown with
even more clarity by subtracting the average slope,
derived by linear fit on the central peak of the distribution.
The fit parameters are: 157.83 µas/mas
(slope); 0.06 µas (offset). In Fig. 4 we show the distribution
of the chromaticity residuals after subtraction
of the above straight line. The number of data
instances in each side peak is about 9% of the sample.
The peaks are quite symmetric and correspond to
±600 µas. From the residuals, a finer structure appears,
which is not currently used in pre-processing.
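
The slope subtraction can be sketched as follows, on synthetic data built from the fit parameters quoted above. The procedure shown (a first fit on all points, followed by a second fit restricted to the points with small residuals) and the 300 µas selection threshold are assumptions introduced for illustration, as are all function names; the text only states that the fit is performed on the central peak.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + offset."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def subtract_trend(cog, chrom, slope, offset):
    """Chromaticity residuals after removal of the straight line."""
    return [c - (slope * g + offset) for g, c in zip(cog, chrom)]


def fit_central_trend(cog, chrom, threshold=300.0):
    """Fit restricted to the central peak (hypothetical selection rule)."""
    slope, offset = linear_fit(cog, chrom)
    residual = subtract_trend(cog, chrom, slope, offset)
    central = [k for k, r in enumerate(residual) if abs(r) < threshold]
    return linear_fit([cog[k] for k in central], [chrom[k] for k in central])


def classify(residual, threshold=300.0):
    """Label each instance by the peak of the residual distribution it falls in."""
    labels = []
    for r in residual:
        if r >= threshold:
            labels.append("high")
        elif r <= -threshold:
            labels.append("low")
        else:
            labels.append("central")
    return labels


def side_shift(i):
    """Synthetic side peaks at +600 and -600 µas for about 10% of the points each."""
    if i % 10 == 0:
        return 600.0
    if i % 10 == 5:
        return -600.0
    return 0.0


indices = list(range(-50, 51))
cog = [float(i) for i in indices]                                  # COG [mas]
chrom = [157.83 * c + 0.06 + side_shift(i) for i, c in zip(indices, cog)]  # [µas]
slope, offset = fit_central_trend(cog, chrom)
labels = classify(subtract_trend(cog, chrom, slope, offset))

if __name__ == "__main__":
    print(f"slope = {slope:.2f} µas/mas, offset = {offset:.2f} µas")
    print({name: labels.count(name) for name in ("low", "central", "high")})
```

The labels play the role of the three subsets discussed in the text; their correspondence with the subset numbering used in the figures is not assumed here.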






Figure 5. Distribution of chromaticity vs. image RMS 
width (top), skewness (centre) and skewness variation 
(bottom). 



The classification of data instances in the chro- 
maticity/COG groups (subsets 1, 2 and 3) is taken 
into account in evaluating the distribution of chro- 
maticity vs. other moments. In some of the plots, 
the groups are clearly localised in specific parameter 
regions. In others, the structure is more complex.

Taking advantage of the structure identified on
the COG distribution, we show in Fig. 5 the distribution
of chromaticity vs. image RMS width (top
panel), skewness (central panel), and skewness variation
(bottom panel) between the selected blue and
red stars. The subsets are shown here with the same
colours as in Figs. 3 and 4, i.e. blue for subset 1, black
for subset 2 and red for subset 3.

The RMS width (top panel) and other even order
moments do not show an apparent structure, and
most of them are not used in the neural processing.
Odd order moments do not directly show the subset
structure, as for the skewness, in the central panel
of Fig. 5. On the other hand, the distribution of chromaticity vs.
skewness variation with spectral type (bottom panel)
clearly shows clustering of the three subsets. This appears
to be a convenient choice for NN input, as it carries
significant information. Similar effects are shown by
other odd order moments.

The NN input can therefore be defined in terms 
of the local instrument response, encoded in the nom- 
inal moments for a reference star, and the individual 
measurement moments. The COG of the reference 
object is the deviation of the image position with re- 
spect to an ideal system, and it is associated to the 
classical distortion. The other reference object inputs 
are the image RMS width, the third and fifth order 
moments. The inputs associated to the measured signal,
from a star of unknown type, are a simple pair of
values, i.e. the variation in the third and fifth order
moments with respect to the known reference case.
Also, taking advantage of the data structure discussed
above, we subtract the linear trend from the
target (the chromaticity) in the training set. This
pre-processing is expected to ease the NN computational
load. The inverse transformation is applied to
the output data on the test set.
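
The assembly of the six NN inputs and the pre-processing of the target can be summarised by the sketch below. The container `Moments` and the function names are hypothetical; the slope and offset are the fit parameters quoted in the text, and the linear trend is assumed to be expressed as a function of the nominal (reference) image COG.

```python
from typing import NamedTuple

SLOPE = 157.83   # µas/mas, fit parameter quoted in the text
OFFSET = 0.06    # µas, fit parameter quoted in the text


class Moments(NamedTuple):
    cog: float        # centre of gravity [mas]
    rms_width: float  # image RMS width [mas]
    m3: float         # third order moment
    m5: float         # fifth order moment


def nn_input(reference, measured):
    """Four nominal values of the reference star followed by the variation
    of the third and fifth order moments of the measured star."""
    return [
        reference.cog,
        reference.rms_width,
        reference.m3,
        reference.m5,
        measured.m3 - reference.m3,
        measured.m5 - reference.m5,
    ]


def detrend_target(chromaticity, reference_cog, slope=SLOPE, offset=OFFSET):
    """Training target: chromaticity [µas] minus the linear trend in the COG."""
    return chromaticity - (slope * reference_cog + offset)


def restore_output(nn_output, reference_cog, slope=SLOPE, offset=OFFSET):
    """Inverse transformation applied to the NN output."""
    return nn_output + (slope * reference_cog + offset)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the data set of the paper.
    reference = Moments(cog=1.0, rms_width=65.0, m3=0.10, m5=0.20)
    measured = Moments(cog=1.2, rms_width=66.0, m3=0.15, m5=0.10)
    print(nn_input(reference, measured))
    target = detrend_target(500.0, reference.cog)
    print(target, restore_output(target, reference.cog))
```

Keeping the forward and inverse transformations side by side makes it easy to verify that the correction applied to the test set output exactly undoes the pre-processing of the training target.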

The training and test sets include respectively the
data of 20000 and 5820 aberration instances, built
according to the above process. The histogram of the
input chromaticity distribution in the test set (Fig. 6)
is approximately Gaussian.

Figure 7. Residual chromaticity distribution.

Chromaticity            Input       Residual
Min. [µas]              -5975.4     -49.1
Mean [µas]              19.0        -1.3
Max. [µas]              5590.8      100.1
RMS [µas]               1641.8      5.7
Fraction in ±3σ [%]     99.8        98.0

Table 3. Statistics over the test set of input and residual
chromaticity, and fraction of instances within ±3σ.

4.3 Neural processing 

We use a sigmoidal NN with six inputs (four nomi- 
nal and two measured values), one output (the chro- 
maticity), and a single hidden layer with 300 units. 
The NN is optimised on the training set, and its performance
is verified on the test set, as described in
Sect. 3.

Figure 8. NN performance: input/output characteristics.

Offset a = 1.30 ± 0.07
Slope b = 1.0002 ± 0.5e-4

Table 4. Linear fit of NN output vs. input chromaticity.

We use an incremental training, i.e. we split the
training set in four subsets of 5000 examples. In the
first training phase, the NN is trained by 1000 iterations
on the first subset, then we add the second
data subset for an additional 1000 iterations on the new
compound set of 10000 examples, and so on until including
the whole training set. The NN training on
the complete data set is carried on for a total of 8000
iterations, with monotonic decrease of the internal
overall RMS error on the training set.
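
The incremental schedule can be written down compactly. The sketch below follows one possible reading of the description, namely 1000 iterations on each partial set and 8000 iterations on the complete set; the function names and the callback `train_one_iteration` are hypothetical placeholders for an actual training step.

```python
def incremental_schedule(n_examples=20000, n_subsets=4,
                         iterations_per_phase=1000, final_iterations=8000):
    """List of (number of examples in use, iterations) for each training phase."""
    size = n_examples // n_subsets
    schedule = [(size * (k + 1), iterations_per_phase) for k in range(n_subsets - 1)]
    schedule.append((n_examples, final_iterations))
    return schedule


def incremental_training(examples, train_one_iteration, schedule):
    """Run the phases in sequence, each on a growing prefix of the training set."""
    for n_used, n_iterations in schedule:
        subset = examples[:n_used]
        for _ in range(n_iterations):
            train_one_iteration(subset)


if __name__ == "__main__":
    for n_used, n_iterations in incremental_schedule():
        print(f"{n_used:6d} examples for {n_iterations} iterations")
```

Starting from a small subset lets the network settle on the coarse structure of the data at low cost before the full training set is used.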

The NN performance is evaluated on the test 
set; in particular, the discrepancy between the NN 
output (estimated chromaticity) and target (actual 
chromaticity for the test set data instances) can be 
considered as the residual chromaticity after correction
based on the NN results. The residual chromaticity
distribution (Fig. 7) is quite symmetric about
zero, and the main statistical parameters are listed
in Tab. 3, compared with the corresponding values in
the input test set. We remark that 98% of the output
data are within the ±3σ interval, vs. a corresponding
fraction of 99.8% on the input.

Since the goal is the computation of output values
coincident with the pre-defined target values, the
characteristics, i.e. the relationship between input
and output (plot shown in Fig. 8), is ideally a straight
line ($y = a + bx$) at angle $\pi/4$, passing through the origin,
i.e. with parameters $\{a = 0, b = 1\}$. We compute
the best fit parameters of the NN output vs.
target distribution and their standard deviation; the
results, shown in Tab. 4, are quite consistent with
the expectations.
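
A straight line fit with parameter uncertainties, of the kind summarised in Tab. 4, can be obtained with the textbook ordinary least squares formulas. The sketch below assumes equal, uncorrelated errors on the NN output; the function name and the synthetic data are hypothetical.

```python
import math


def linear_fit_with_errors(x, y):
    """Fit y = a + b*x and return (a, b, sigma_a, sigma_b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    sigma_b = math.sqrt(s2 / sxx)
    sigma_a = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return a, b, sigma_a, sigma_b


if __name__ == "__main__":
    # Synthetic target/output pairs with a small alternating perturbation.
    target = [float(k) for k in range(-50, 51)]
    output = [1.3 + 1.0002 * t + (0.5 if k % 2 else -0.5) for k, t in enumerate(target)]
    a, b, sigma_a, sigma_b = linear_fit_with_errors(target, output)
    print(f"offset a = {a:.3f} +/- {sigma_a:.3f}")
    print(f"slope  b = {b:.5f} +/- {sigma_b:.5f}")
```

An ideal network would give a = 0 and b = 1 within the quoted uncertainties, so the ratio between the fitted offset and its standard deviation is a convenient indicator of a residual systematic term.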



5 CONCLUSIONS 

In this paper we use a neural network to diagnose 
and correct the chromaticity on astrometric measurements,
in a framework consistent with the Gaia
mission. The science data are efficiently encoded in a
set of low order image moments. The NN, with 300 
internal nodes, is trained on a set of 20,000 data in- 
stances, and evaluated on a test set of 5820 cases. 

The NN diagnostics on the test set appears to 
be quite effective, as the RMS residual chromaticity, 
after correction based on NN results, is reduced by 
more than two orders of magnitude (factor ~ 280) 
with respect to the initial RMS value (Tab. 3).

Applying the network output for correction of 
the chromaticity on the elementary Gaia measure- 
ments, therefore, we may expect a significant reduc- 
tion of this source of systematic error; in particular, 
the residual chromaticity can be expected to be random,
and possibly subject to further statistical averaging
in subsequent measurements. A word of caution
is in order, however, due to the 1.3 µas residual
offset. This may not be reduced as easily by simple
measurement statistics, and it would be desirable
for it to be close to zero. It appears that
the residual offset is related to the limited size of the
training set, and the number of internal nodes, with
respect to the large input chromaticity range. Also,
it may be related to the mean chromaticity of the
training set, close to 6 µas rather than zero.
We expect that, increasing the training set and the 
number of nodes, the residual chromaticity offset will 
decrease. This will be part of the future develop- 
ments. Also, the sensitivity to measurement noise, as 
propagated to the image moments, will be subject of 
further investigations. 

From the current results, neural network diag- 
nostics for suppression of the chromatic errors on 
astrometric measurements appears to be a highly 
promising tool. 